In [1]:
import monk.core.api as ms
from monk.roles.configuration import default_config
In [2]:
config = default_config()
In [3]:
ms.initialize(config)
Out[3]:
In [4]:
ents = ms.convert_entities()
- _raws: a field that stores temporary features for MONK. Keys are strings; values can be any object.
- _features: a field that stores the internal feature representations for MONK.
- creator, createdTime, lastModified: fields that keep version information.
- monkType: the type defined in MONK.
- generic: converts a MONKObject to a JSON blob.
In [5]:
ents[0].generic()
Out[5]:
In [6]:
ms.find_type('Turtle')
Out[6]:
In [7]:
ms.find_type('Panda')
Out[7]:
In [8]:
ms.find_type('Tigress')
Out[8]:
In [4]:
unigramTS = ms.yaml2json('turtle_scripts/turtle_unigram.yml')
In [5]:
unigramTS
Out[5]:
In [7]:
unigramT = ms.create_turtle(unigramTS)
In [8]:
stemTS = ms.yaml2json('turtle_scripts/turtle_stem.yml')
In [9]:
stemT = ms.create_turtle(stemTS)
In [26]:
stemT = ms.load_turtle('travel_stem','monk')
stemT.generic()
Out[26]:
In [27]:
for ent in ents:
    stemT.predict(ent, fields)
load_entities(query={}, skip=0, num=100, collectionName=None)
Parameters:
- query: a MongoDB-style query spec, e.g., {'tag':{'$in':['Shopping','Wine']}}.
- skip: the index at which to start the retrieval.
- num: the number of documents to retrieve.
- collectionName: the name of the collection.
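To make the interplay of query, skip and num concrete, here is a minimal in-memory sketch. It is not MONK's implementation: `matches` and `load_entities_sketch` are hypothetical helpers, the documents are plain dicts instead of a MongoDB collection, and only equality and `$in` conditions are handled.

```python
def matches(doc, query):
    """Check a dict against a tiny subset of MongoDB query syntax (equality and $in)."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict) and '$in' in condition:
            allowed = condition['$in']
            # Like MongoDB, treat a list-valued field as a match if any element is allowed.
            values = value if isinstance(value, list) else [value]
            if not any(v in allowed for v in values):
                return False
        elif value != condition:
            return False
    return True


def load_entities_sketch(docs, query=None, skip=0, num=100):
    """Filter first, then drop `skip` matches and keep at most `num` (hypothetical helper)."""
    selected = [doc for doc in docs if matches(doc, query or {})]
    return selected[skip:skip + num]


docs = [
    {'title': 'Outlet mall', 'tag': 'Shopping'},
    {'title': 'Vineyard tour', 'tag': 'Wine'},
    {'title': 'Ridge trail', 'tag': 'Hiking'},
    {'title': 'Tasting dinner', 'tag': ['Wine', 'Food']},
]
query = {'tag': {'$in': ['Shopping', 'Wine']}}
page = load_entities_sketch(docs, query, skip=1, num=1)
```

Under these assumptions, three documents match the query, and skip=1 with num=1 returns only the second match, which is how the two parameters can be combined to page through a large collection.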
In [4]:
ents = ms.load_entities()
print(len(ents))
ents[0].generic()
Out[4]:
In [7]:
fields=['title', 'comment', 'desc']
In [12]:
stemT.save()
Turtle provides a solution for a specific problem.
| Parameters | Definitions | Examples |
|---|---|---|
| monkType | Turtle type | MultiLabelTurtle |
| name | A unique string | flydragon_tagger |
| description | A detailed project description that helps others understand the turtle | Used to predict first level tags for activities stored in flydragon |
| mapping | An encoder that encodes the targets into the internal pandas structures. It is not useful for a multi-label turtle, but it is useful for a multi-class turtle. The encoding strategy can be tricky because it affects the final accuracy. For example, an Error-Correcting Output Code can be a good choice, while a random binary code can be a bad one. For a Sum-Product-Network turtle, the code is learned from the data, which relieves scientists of the coding burden | {'dinning':[0000001000], 'hotel':[100000000]} |
| entityCollectionName | The data collection name to work on, assuming the database is given in MONK's configuration | activities |
| requires | Defines the turtle's dependencies, that is, which features to use. It can hold uids or turtle_ids. When turtle_ids are used, the uids of all pandas in those turtles become features for this turtle | {'requires': {'turtleIds':[<id1>, <id2>]}} |
| pMaxPathLength | Inference uses the Beam Search algorithm. This parameter is the maximum length of a search path before the search stops | 1 |
| pMaxInferenceSteps | The maximum number of inferences before it gives up. | 1 |
| tigress | A tigress that acts as the supervisor, training the turtle and measuring its performance | see below |
| pandas | A list of pandas that are employed to solve the problem | see below |
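The remark about Error-Correcting Output Codes in the mapping row can be illustrated with a small decoding sketch. This is not MONK's encoder: the 5-bit code, the `hamming_distance` and `decode` helpers, and the idea that each bit comes from one panda's binary prediction are all assumptions made for illustration.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have the same length")
    return sum(x != y for x, y in zip(a, b))


def decode(predicted_bits, mapping):
    """Return the target whose codeword is nearest to the predicted bits."""
    return min(mapping, key=lambda target: hamming_distance(mapping[target], predicted_bits))


# Hypothetical code: every pair of codewords differs in at least 3 bits,
# so a single wrong bit still decodes to the right target.
mapping = {
    'dinning': [0, 0, 0, 0, 0],
    'hotel':   [1, 1, 1, 0, 0],
    'boating': [0, 0, 1, 1, 1],
}

# Assumed panda outputs for a 'hotel' entity, with the second bit flipped by mistake.
noisy_bits = [1, 0, 1, 0, 0]
decoded = decode(noisy_bits, mapping)
```

The distance between codewords is what makes the code error correcting. A random binary code may place two targets only one bit apart, in which case a single misfiring classifier changes the decoded target.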
Tigress provides functionalities to supervise and measure the performance of a turtle.
| Parameters | Definitions | Examples |
|---|---|---|
| monkType | Tigress type | MultiLabelTigress |
| name | A string | flydragon_tagger |
| description | Detailed description | Equal weighted multilabel classifiers |
| costs | The cost of each tag/label being incorrectly predicted. If a tag has no cost defined, defaultCost is used | {'dinning':1.0, 'boating':0.2} |
| defaultCost | Used when the cost of a tag is not specified. If defaultCost is not defined, the smallest value in costs is used | 1.0 |
| displayTextFields | The entity fields to display during interactive learning | ['title','reviews'] |
| displayImageField | If possible, display an image (url) | photo_url |
| activeBatchSize | The number of uncertain examples to scan through in each active learning stage. Defaults to 100 | 100 |
| pCuriosity | The active learning factor that trades off exploitation against exploration. 0.0 means no exploitation | 0.0 |
| patterns | For PatternTigress and children, each target has a pattern to search. If the pattern matches, the tag is on, otherwise the tag is off | {'dinning':'dinning'} |
| fields | The fields that the tigress searches through | ['title', 'description', ...] |
| mutualExclusive | True if only one target can exist in the fields, False otherwise | False |
| defaulting | True to use the default tag when nothing is found in the fields, False to ignore the example | False |
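The fallback rule for costs and the pattern search described in the table can be sketched as follows. `resolve_cost` and `match_patterns` are hypothetical helpers rather than MONK functions, the entity is a plain dict, and patterns are assumed to be regular expressions.

```python
import re


def resolve_cost(tag, costs, default_cost=None):
    """Pick the cost for a tag: explicit cost, else defaultCost, else the smallest cost."""
    if tag in costs:
        return costs[tag]
    if default_cost is not None:
        return default_cost
    # Raises ValueError when costs is empty and no default is given.
    return min(costs.values())


def match_patterns(entity, patterns, fields):
    """Turn a tag on when its pattern matches the text of the listed fields, off otherwise."""
    text = ' '.join(str(entity.get(field, '')) for field in fields)
    return {target: re.search(pattern, text) is not None
            for target, pattern in patterns.items()}


costs = {'dinning': 1.0, 'boating': 0.2}
entity = {'title': 'Fine dinning downtown', 'description': 'Seasonal menu'}
tags = match_patterns(entity, {'dinning': 'dinning', 'boating': 'boat'},
                      ['title', 'description'])
```

With these inputs, a tag missing from costs falls back to 0.2 (the smallest listed cost) unless a defaultCost is supplied, and only the dinning tag is switched on for the sample entity.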
Panda is a basic classifier/regressor.
| Parameters | Definitions | Examples |
|---|---|---|
| monkType | Panda type | LinearPanda |
| name | A unique string | dinning |
| mantis | A learning algorithm | see below |
Mantis is a basic learning algorithm.
| Parameters | Definitions | Examples |
|---|---|---|
| monkType | Mantis type | Mantis |
| maxNumIters | Maximum number of iterations to perform optimization | 100 |
| maxNumInstances | Maximum number of instances for each user | 1000 |
| eps | Convergence interval | 1e-4 |
| lam | Lambda that controls the regularization strength | 1 |
| rho | Personalization strength. The smaller the value, the stronger the personalization | 1 |
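One way to read these parameters together is as the knobs of a regularized, iteratively optimized model whose per-user weights are pulled towards a shared solution. The sketch below is an assumption, not MONK's Mantis: the objective (average logistic loss + lam/2 * ||w||^2 + rho/2 * ||w - consensus||^2), the plain gradient descent solver, and the names `logistic_gradient` and `train_sketch` are all made up for illustration.

```python
import math


def logistic_gradient(w, x, y):
    """Gradient of log(1 + exp(-y * w.x)) with respect to w, for y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    scale = -y / (1.0 + math.exp(min(margin, 700.0)))  # clamp to avoid overflow
    return [scale * xi for xi in x]


def train_sketch(data, consensus, lam=1.0, rho=1.0, max_num_iters=100, eps=1e-4):
    """Gradient descent on the assumed objective; stops after max_num_iters or when
    no weight moves by more than eps."""
    w = list(consensus)
    # Step size from a curvature bound, so the iteration cannot diverge.
    max_sq_norm = max(sum(xi * xi for xi in x) for x, _ in data)
    step = 1.0 / (lam + rho + 0.25 * max_sq_norm)
    for _ in range(max_num_iters):
        # Regularization pulls w towards zero, rho pulls it towards the consensus.
        grad = [lam * wi + rho * (wi - zi) for wi, zi in zip(w, consensus)]
        for x, y in data:
            g = logistic_gradient(w, x, y)
            grad = [gi + gj / len(data) for gi, gj in zip(grad, g)]
        new_w = [wi - step * gi for wi, gi in zip(w, grad)]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < eps
        w = new_w
        if converged:
            break
    return w


# One user's labeled examples as (features, label) pairs.
data = [([1.0, 0.0], 1), ([-1.0, 0.0], -1)]
w_personal = train_sketch(data, consensus=[0.0, 0.0], rho=0.01)
w_shared = train_sketch(data, consensus=[0.0, 0.0], rho=100.0)
```

Under this reading, a small rho lets the user's own data move the weights away from the consensus, while a large rho keeps them close to it, which matches the "smaller means more personalized" description in the table.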
In [5]:
likeTS = ms.yaml2json('turtle_scripts/turtle_like.yml')
print(likeTS)
In [6]:
likeT = ms.create_turtle(likeTS)
In [7]:
likeT.save()
In [9]:
ent = ents[0]
In [32]:
ents[0].generic()
Out[32]:
In [11]:
ent._setattr('likeTravel', 'Y')
ms.crane.entityStore.save_one(ent)
In [12]:
ent.generic()
Out[12]:
In [10]:
likeT.pandas[0].mantis
Out[10]:
In [13]:
ms.add_data('likeTravel', 'monk', str(ent._id))
Out[13]:
In [14]:
likeT.tigress.p
Out[14]:
In [15]:
likeT.pandas[0].mantis.data
Out[15]:
In [16]:
ms.add_data('likeTravel', 'monk', str(ents[1]._id))
Out[16]:
In [17]:
likeT.tigress.defaulting=True
In [18]:
likeT.pandas[0].mantis.data
Out[18]:
In [21]:
likeT = ms.load_turtle('likeTravel','monk')
In [22]:
likeT.train()
In [40]:
from monk.math.cmath import sign0
In [ ]:
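def predict_like(turtleName, creator, ent_id):
    # Load the trained turtle and the entity, then score the entity with the first panda.
    likeT = ms.load_turtle(turtleName, creator)
    ent = ms.load_entity(ent_id)
    return likeT.pandas[0].predict(ent)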
In [25]:
likeT.pandas[0].predict(ents[0])
Out[25]: